Face metadata creation

ABSTRACT

A face metadata creating technique which produces a short description length and extracts facial features for face recognition that are robust to local errors. An area cut-out section (121) cuts out a local area of a face image. Frequency feature extracting means (122) extracts the frequency spectrum of the local area. Vector projection means (123) projects the obtained frequency feature onto a subspace to extract the facial feature of the local area. A face metadata unit (12) extracts the facial features from local areas cut out at different positions, thus creating the facial features as face metadata.

TECHNICAL FIELD

The present invention relates to a face recognition technique which can be used in face recognition tasks such as face identification, face verification, facial expression recognition, sex classification based on a face, and age estimation based on a face and, more particularly, to a metadata generating unit, method, and program for generating metadata related to face information captured as a still picture or moving pictures.

BACKGROUND ART

Metadata is typically data describing or representing the meaning of other data. In the case of face recognition, metadata means data regarding face data such as a still face picture or moving pictures.

As standardization activities of metadata for multimedia contents such as video, pictures, and voice, the activities of MPEG-7 (an international standard for a multimedia content description interface standardized by MPEG: Moving Picture Experts Group, i.e., ISO/IEC JTC1/SC29/WG11) are well known. Among these activities, a face recognition descriptor is proposed as a descriptor of metadata related to face recognition (A. Yamada et al., “MPEG-7 Visual part of eXperimental Model Version 9.0”, ISO/IEC JTC1/SC29/WG11 N3914, 2001).

In the face recognition descriptor, a clipped and normalized face image is subjected to a kind of subspace method generally called the eigenface method. Specifically, a basis matrix for extracting features of the face image is obtained and, using the basis matrix, a facial feature is extracted from the image as metadata. In addition, it is proposed to use a weighted absolute distance as a measure of similarity between facial features.

Various methods related to face recognition are known. For example, a face recognition technique using an eigenspace method based on principal component analysis (Moghaddam et al., “Probabilistic Visual Learning for Object Representation”, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 19, No. 7, pp. 696-710, 1997) and one based on discriminant analysis (W. Zhao et al., “Discriminant Analysis of Principal Components for Face Recognition”, Proceedings of the IEEE Third International Conference on Automatic Face and Gesture Recognition, pp. 336-341, 1998) are known. In addition, a face identification method using independent component analysis is known (Umeyama et al., “Kyoshi-tsuki Dokuritsu Seibun Bunseki wo mochiita Kao no Dotei nitsuite [Face Identification Using Supervised Independent Component Analysis]”, The Institute of Electronics, Information and Communication Engineers, PRMU99-27, 1999).

On the other hand, Japanese Unexamined Patent Publication No. 5-20442 and a document (Akamatsu et al., “Notan-gazo Macchingu niyoru Robasutona Shomen-gao no Shikibetsu Hoho—Fourier Supekutoru no KL Tenkai no Oyo—[Robust Full Face Identification Using Gray Scale Image Matching—Application of K-L Expansion of Fourier Spectrum—]”, IEICE Transactions, J76-D-II, No. 7, pp. 1363-1373, 2001) each disclose a face image identification technique. According to this technique, the power spectrum of the Fourier frequencies of a face image is subjected to principal component analysis to obtain a facial feature, and face identification is performed using the obtained feature. Because the power spectrum obtained by the Fourier transform is invariant to translation, this approach yields a more favorable result than principal component analysis applied directly to the pixels of an image as feature vectors.

In addition, an image matching method that divides an image into local image areas and performs template matching is known (Saito, “Burokku Tohyo-shori niyoru Shahei ni Gankyona Tenpureito Macchingu [Robust Template Matching for Occlusion Using Vote by Block]”, IEICE Transactions, Vol. J84-D-II, No. 10, pp. 2270-2279). According to this method, matching is performed for each local area to obtain an evaluation. The evaluations of the respective local areas are accumulated to calculate the overall matching evaluation. Alternatively, the evaluations of the respective local areas are applied to a voting space to calculate the matching evaluation.

However, in the known techniques, principal component analysis or independent component analysis is performed using, as input features, pixel values obtained by uniformly sampling the whole face image, or a Fourier spectrum of the whole image. Therefore, a matching error generated in a part of the image (for example, a matching error caused by occlusion or a fluctuation in the orientation of a face) has a ripple effect on the vectors projected onto a subspace. Thus, the known techniques are disadvantageous in that the whole evaluation is influenced and identification accuracy does not increase. The reason is as follows. When pixel features are subjected to principal component analysis, basis vectors are obtained whose elements, in many cases, have non-zero coefficients over the whole set of pixels. Consequently, the feature vectors after projection are influenced by an error generated in any part of the image.

On the other hand, in template matching, an image is divided into local image areas, so matching can be performed in a manner that absorbs occlusion or the like. However, the computational cost of block matching is large, which is a problem in practical applications.

Therefore, it is an object of the present invention to provide a face metadata generation technique in which the description length is short and the computational cost of matching can be reduced.

Another object of the present invention is to provide a face metadata generation technique which is capable of increasing the accuracy of face recognition.

DISCLOSURE OF INVENTION

The present invention provides a face metadata generating unit for generating metadata related to face information of an image, the face metadata generating unit including at least: area clipping means for clipping local areas of the image; frequency feature extracting means for extracting frequency features from the areas clipped by the area clipping means; and vector projection means for projecting feature vectors, which are vectors consisting of the frequency features extracted by the frequency feature extracting means, onto predefined subspaces, thereby extracting the projected feature vectors of a plurality of different local areas to generate them as face metadata.

In the above-mentioned face metadata generating unit, preferably, the frequency feature extracting means extracts power spectral intensities of Fourier frequencies, obtained by discrete Fourier transform, as frequency features. Alternatively, the frequency feature extracting means extracts expansion coefficients, obtained by discrete cosine transform or discrete sine transform, as frequency features.

Further, preferably, the vector projection means projects frequency feature vectors onto subspaces specified by basis vectors, which are previously obtained by principal component analysis, discriminant analysis, or independent component analysis of the frequency features, to calculate principal component vectors.

The area clipping means may search for area positions corresponding to the respective local areas in the image to obtain clipping positions, and then clip the local areas.

Face metadata extracted by the above-mentioned face metadata generating unit has a compact description length, which enables face image matching at high speed and with high accuracy.

BRIEF DESCRIPTION OF THE DRAWING

FIG. 1 is a block diagram of the structure of a face image matching system including a face metadata generating unit according to an embodiment of the present invention.

BEST MODE FOR CARRYING OUT THE INVENTION

To describe the present invention in more detail, the present invention will now be explained with reference to the accompanying drawing.

FIG. 1 is a block diagram of a face image matching system including a face metadata generating unit according to the present invention.

The face image matching system will now be described in detail with reference to FIG. 1.

As shown in FIG. 1, the face image matching system according to the present invention comprises a face image input unit 11 for inputting a face image, a face metadata generating unit 12 for generating face metadata from a face image inputted by the face image input unit 11, a face metadata storage unit 13 for storing face metadata generated (extracted) by the face metadata generating unit 12, a face similarity calculating unit 14 for calculating the similarity of faces from the face metadata, a face image database 15 for storing the face images, a controller 16 for controlling, in response to a registration request and a retrieval request for an image, the input of the image, the generation of the metadata, the storing of the metadata, and the calculation of face similarity, and a display unit 17 for displaying the face image and other information.

The face metadata generating unit 12 comprises an area clipping section 121 for clipping local areas of the inputted face image, a frequency feature extracting section 122 for extracting frequency features from the clipped areas, and a vector projection section 123 for projecting feature vectors, which are vectors consisting of the frequency features, onto subspaces to extract feature vectors. The face metadata generating unit 12 extracts feature vectors from a plurality of different local areas to generate face metadata.

To register a face image, a face photograph or the like is inputted using the face image input unit 11, such as a scanner or a video camera, such that the size and position of the face are adjusted. Alternatively, the face of a person can be inputted directly by the video camera. In this case, a face detection technique as described in the above-mentioned document by Moghaddam may be used to detect the face position in the inputted image. Preferably, the size or the like of the face image may be automatically normalized.

An inputted face image is registered in the face image database 15 as necessary. Simultaneously with the registration of the face image, face metadata is generated by the face metadata generating unit 12. The generated face metadata is stored in the face metadata storage unit 13.

In retrieval, as in the case of registration, a face image is inputted by the face image input unit 11 and face metadata is then generated by the face metadata generating unit 12. The generated face metadata is temporarily registered in the face metadata storage unit 13. Alternatively, the face metadata is directly transmitted to the face similarity calculating unit 14.

In retrieval, to determine whether the inputted face image has already been stored in the database (face identification), the face similarity calculating unit 14 calculates a similarity between the inputted face image and each of the data items registered in the face metadata storage unit 13. On the basis of the highest similarity, the controller 16 selects a face image from the face image database 15 and causes the display unit 17 to display it. The operator then verifies the match between the retrieved image and the image to be registered.

On the other hand, to determine whether a face image specified by an ID number or the like matches a retrieved face image (face verification), the face similarity calculating unit 14 calculates whether the retrieved image matches the face image specified by the ID number. If the similarity between them is lower than a predetermined similarity, the controller 16 determines that there is no match; if it is higher, the controller 16 determines that there is a match. The controller 16 causes the display unit 17 to display the verification result. If this system is used for entrance control, instead of providing a visual indication of a face image, the controller 16 transmits a control signal to an automatic door to control it. Thus, entrance can be controlled.

The face image matching system operates as mentioned above. The above-mentioned operation can also be realized on a computer system. For example, a metadata generation program for executing metadata generation, as will be described in detail later, and a similarity calculation program are stored in a memory, and those programs are executed by a program control processor. Thus, face image matching can be realized.

The operation of the face image matching system, particularly, the operation of the face metadata generating unit 12 and that of the face similarity calculating unit 14, will now be described in detail.

(1) Face Metadata Generation

First, the operation of the face metadata generating unit 12 will be described.

The face metadata generating unit 12 extracts a facial feature using an image I(x, y) whose position and size are normalized. For the normalization of position and size, for instance, the image may be normalized so that the positions of the respective eyes are set to (32, 48) and (62, 48) and the size of the image corresponds to 92×112 pixels. In the following description, it is assumed that the image is normalized to this size.
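Such a normalization can be realized with a similarity transform once the eye positions are known. The following is a minimal Python sketch of one way to do it, assuming the eye centers have already been detected by some means; the function name, the use of scipy for the warp, and the bilinear interpolation order are illustrative choices, not part of the described method.

```python
import numpy as np
from scipy.ndimage import affine_transform

# Canonical layout from the text: eyes at (32, 48) and (62, 48)
# in a 92x112 (width x height) normalized image.
TARGET_L = np.array([32.0, 48.0])   # (x, y) of the left eye
TARGET_R = np.array([62.0, 48.0])   # (x, y) of the right eye
OUT_W, OUT_H = 92, 112

def normalize_face(image, left_eye, right_eye):
    """Warp a 2-D gray-scale array so the detected eye centers (x, y)
    land on the canonical positions; returns a 112x92 array."""
    src = np.asarray(right_eye, float) - np.asarray(left_eye, float)
    dst = TARGET_R - TARGET_L
    # One complex ratio encodes the scale and rotation that map the
    # canonical eye vector onto the detected one.
    r = complex(src[0], src[1]) / complex(dst[0], dst[1])
    A = np.array([[r.real, -r.imag],
                  [r.imag,  r.real]])          # input_xy = A @ output_xy + b
    b = np.asarray(left_eye, float) - A @ TARGET_L
    # scipy indexes (row, col) = (y, x), so reorder the matrix and offset.
    return affine_transform(image, A[::-1, ::-1], offset=b[::-1],
                            output_shape=(OUT_H, OUT_W), order=1)
```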

The area clipping section 121 then clips a plurality of previously set local areas of the face image. For example, the above-mentioned image is separated into 42 (=M) local areas each having 16×16 pixels, whose centers are the points (x, y) = (15*i+8, 15*j+10) (i = 0, 1, 2, . . . , 5; j = 0, 1, 2, . . . , 6) arranged at regular intervals. First, the area clipping section 121 clips a local area s(x, y) as the area (i, j) = (0, 0).
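As a concrete illustration, the 42-area grid described above can be clipped as in the following sketch, assuming the 112×92 normalized image of the previous step; the function name is hypothetical.

```python
import numpy as np

def clip_local_areas(face):
    """Clip the 42 (= 6 x 7) 16x16 local areas of a normalized face
    (a 112x92 array), centered at (x, y) = (15*i + 8, 15*j + 10)."""
    areas = []
    for j in range(7):              # j = 0..6 (vertical)
        for i in range(6):          # i = 0..5 (horizontal)
            cx, cy = 15 * i + 8, 15 * j + 10
            areas.append(face[cy - 8:cy + 8, cx - 8:cx + 8])  # rows = y
    return np.stack(areas)          # shape (42, 16, 16)
```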

In the above-mentioned clipping of the local areas, each local area may be clipped at a predetermined position. Alternatively, a face image is divided into partial areas corresponding to parts of a face (the eyes, nose, mouth, and eyebrows); the partial areas are detected to find the area positions corresponding to the respective local areas in the face image, the clipping positions are corrected, and after that the local areas are clipped. Thus, displacements of the respective parts caused by the orientation of the face can be corrected, which leads to the extraction of more stable facial features. For example, templates of local areas are formed on the basis of an average face calculated as the average of the inputted images. Each template is searched for in the vicinity of a reference position (the position in the average face may be used). The clipping position is corrected on the basis of the template matching position, and then the local area (partial area of the face) is clipped. In this template matching, normalized correlation is used.
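A minimal sketch of such a correction step follows; the ±4-pixel search radius and the function name are assumptions, but the score is the normalized correlation named in the text.

```python
import numpy as np

def refine_clip_position(face, template, ref_xy, radius=4):
    """Search around a reference center for the position maximizing the
    normalized correlation with 'template' (e.g., a 16x16 average-face
    patch) and return the corrected (x, y) clipping center."""
    h, w = template.shape
    t = template - template.mean()
    t_norm = np.linalg.norm(t)
    best_score, best_xy = -np.inf, tuple(ref_xy)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            cx, cy = ref_xy[0] + dx, ref_xy[1] + dy
            patch = face[cy - h // 2:cy + h // 2, cx - w // 2:cx + w // 2]
            if patch.shape != template.shape:
                continue                     # candidate falls off the image
            p = patch - patch.mean()
            denom = np.linalg.norm(p) * t_norm
            score = (p * t).sum() / denom if denom > 0 else -np.inf
            if score > best_score:
                best_score, best_xy = score, (cx, cy)
    return best_xy
```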

In the above description, each template corresponds to a facial part; alternatively, the local areas can be defined by uniform sampling as mentioned above.

As mentioned above, facial parts are held as templates and the positions of the respective templates are corrected, thus correcting displacements of local areas (facial parts) that are caused by a change in attitude and cannot be corrected by alignment of the entire face. After that, the local features of the face are extracted. Consequently, the output features of the local areas are stabilized, resulting in an increase in identification accuracy.

As another example of facial part detection, a facial part extraction technique is disclosed in Japanese Unexamined Patent Publication No. 10-307923. Facial parts can also be extracted according to this technique.

The frequency feature extracting section 122 applies a two-dimensional discrete Fourier transform to each clipped local area s(x, y) and calculates the power |S(u, v)| of the obtained Fourier spectrum S(u, v). The calculating method for obtaining the Fourier spectrum S(u, v) of a two-dimensional image using the discrete Fourier transform is well known. For example, this method is explained in a document (Rosenfeld, “Dijitaru Gazo Shori [Digital Image Processing]”, pp. 20-26, Kindaikagaku Corporation). Accordingly, the description of this method is omitted.

Because the two-dimensional Fourier power spectrum |S(u, v)| is obtained by transforming the two-dimensional real-valued components of the image, the obtained Fourier frequency components are symmetric. The power spectrum |S(u, v)| therefore has 256 components (u = 0, 1, . . . , 15; v = 0, 1, . . . , 15), of which the 128 half components (u = 0, 1, . . . , 15; v = 0, 1, . . . , 7) are substantially the same as the other half (u = 0, 1, . . . , 15; v = 8, 9, . . . , 15). The frequency feature extracting section 122 eliminates the DC component |S(0, 0)|, which is susceptible to changes in illumination, and extracts the power spectrum of the remaining 127 components of the first half as frequency features.
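Combining the two steps above, a feature extractor for one local area might look like the following sketch; the axis along which the half spectrum is taken is an assumption about the (u, v) layout, and the function name is hypothetical.

```python
import numpy as np

def frequency_features(area):
    """127-dimensional frequency feature of a 16x16 local area:
    2-D DFT power spectrum, keep the non-redundant half (v = 0..7),
    drop the illumination-sensitive DC component |S(0, 0)|."""
    power = np.abs(np.fft.fft2(area))   # |S(u, v)|, shape (16, 16)
    half = power[:, :8].ravel()         # 128 of the 256 components
    return half[1:]                     # remove |S(0, 0)| -> 127 features
```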

Instead of the Fourier transform, the discrete cosine transform or the discrete sine transform can be used, with the expansion coefficients extracted as frequency features. When the discrete cosine transform is used, the image is transformed with the origin of the coordinates placed at the center of the image, so that features can be extracted in which asymmetric components of a face (particularly, left-right asymmetric components) are not extracted. Note that with the discrete cosine transform or discrete sine transform, translation invariance is not achieved as it is with the Fourier power spectrum; the accuracy of the previously performed alignment therefore readily affects the result, and attention must be paid to the alignment.
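For comparison, the same extractor with the discrete cosine transform could be sketched as below; which and how many coefficients to keep is not specified in the text for this variant, so retaining all but the DC coefficient is an assumption.

```python
import numpy as np
from scipy.fft import dctn

def dct_features(area):
    """Expansion coefficients of a 16x16 local area by 2-D discrete
    cosine transform (no conjugate symmetry, so no half is discarded);
    the DC coefficient is dropped as in the Fourier case."""
    coeffs = dctn(area, norm='ortho').ravel()   # 256 coefficients
    return coeffs[1:]
```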

Subsequently, the vector projection section 123 handles the 127 frequency features extracted for a local area as a vector. The subspaces are predefined as follows. A face image set for training is prepared, the frequency feature vectors of the corresponding clipped areas of the training set are subjected to principal component analysis, and the resulting basis vectors (eigenvectors) define the subspaces. The method for obtaining basis vectors is described in various documents, for example, the above-mentioned document by Moghaddam and Japanese Unexamined Patent Publication No. 5-20442, and is generally well known; accordingly, its description is omitted. It is assumed that the basis consists of N vectors (the first to N-th principal components) in decreasing order of eigenvalue. For N, five components are enough: the original 256 feature dimensions can be compressed by a factor of about 50, because dimensional compression by principal component analysis (K-L expansion) is highly effective, and the facial features can be described compactly. A subspace serving as the feature space is specified using those N basis vectors. However, the basis vectors are not used as unit vectors; instead, the components of each basis vector are normalized using the eigenvalue corresponding to its eigenvector, and the resultant vectors are used as the basis vectors.

In other words, assuming that U denotes a matrix whose elements are basis vectors forming an orthonormal basis, each basis vector U_(k), a unit vector of length 1 constituting one element of the matrix U, is divided by the square root of the corresponding eigenvalue λ_(k). The basis vectors are transformed in this way beforehand. Consequently, the amount of matching computation using the Mahalanobis distance can be reduced in identification, as will be described later.
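A sketch of this training-time computation, under the assumption that the principal components are obtained by an eigendecomposition of the sample covariance matrix (any standard PCA routine would do):

```python
import numpy as np

def train_projection_basis(train_vectors, n_components=5):
    """For one local area, obtain the N basis vectors by principal
    component analysis and pre-divide each by the square root of its
    eigenvalue, so that a plain squared distance after projection
    equals the Mahalanobis distance.
    train_vectors: (num_samples, 127) frequency features."""
    x = train_vectors - train_vectors.mean(axis=0)
    cov = x.T @ x / len(x)
    eigvals, eigvecs = np.linalg.eigh(cov)          # ascending order
    idx = np.argsort(eigvals)[::-1][:n_components]  # largest first
    U = eigvecs[:, idx].T                           # rows are U_k
    return U / np.sqrt(eigvals[idx])[:, None]       # U_k / lambda_k**0.5

# Projection is then a single matrix-vector product, y = basis @ x,
# yielding the N(=5)-dimensional facial feature of the local area.
```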

The above fact will now be described in more detail. It is assumed that two frequency feature vectors x₁ and x₂ are projected onto subspaces using the orthonormal basis matrix U to obtain vectors y₁ and y₂, so that y₁ = Ux₁ and y₂ = Ux₂. The Mahalanobis distance between the two patterns is then

$$d(y_1, y_2) = \sum_{k=1}^{N} \left( y_{1,k} - y_{2,k} \right)^2 / \lambda_k = \sum_{k=1}^{N} \left( y_{1,k}/\lambda_k^{1/2} - y_{2,k}/\lambda_k^{1/2} \right)^2 = \sum_{k=1}^{N} \left( U_k x_1/\lambda_k^{1/2} - U_k x_2/\lambda_k^{1/2} \right)^2 \qquad (1)$$

In other words, if the basis vector U_(k)/λ_(k)^(1/2), obtained by dividing each component by the square root of the eigenvalue beforehand, is used as the basis vector, the Mahalanobis distance is simply the squared distance between the projected vectors y₁′ = (U_(k)/λ_(k)^(1/2))x₁ and y₂′ = (U_(k)/λ_(k)^(1/2))x₂. Thus, the amount of computation can be reduced. Hitherto, in many cases, a mean vector has been subtracted in order to obtain the projection onto subspaces. However, if the similarity is calculated using distances such as squared distances, subtracting the mean merely shifts the feature vectors with respect to the origin; subtracting the mean vector is therefore unimportant so long as all feature vectors are shifted uniformly.

In this manner, the vector projection section 123 can extract feature vectors projected onto the N (=5)-dimensional subspaces. According to the above-mentioned principal component analysis, the features of an original image can be approximately represented compactly in a small number of dimensions. Representing a facial feature with a small number of dimensions results in a reduced description length of the metadata and an increased matching speed.

The above description relates to the case where, according to principal component analysis, frequency vectors are projected onto subspaces to extract a facial feature. Alternatively, as described in the foregoing document by Zhao, discriminant analysis may be used to select the basis vectors serving as feature components. In this case as well, five basis vectors are selected by discriminant analysis in a manner similar to the above description, and the vectors are projected onto the subspaces as in the case of principal component analysis. So long as sufficient training data is available, discriminant analysis has higher accuracy than principal component analysis; therefore, if enough training data can be collected, it is preferable to use discriminant analysis. The method for selecting basis vectors is described in the foregoing document by Zhao and is also well known, so its detailed description is omitted.

Similarly, independent component analysis may be used to select basis vectors; in that case, the selected basis vectors are non-orthogonal. The frequency feature vectors can nevertheless be projected onto the subspaces selected in a similar way. Independent component analysis is also well known; it is disclosed, for example, in the foregoing document by Umeyama et al., so its detailed description is omitted.

When subspaces are selected by discriminant analysis or independent component analysis, values corresponding to the eigenvalues λ_(k) used in principal component analysis are calculated separately: using the feature vectors of a training set projected onto the subspaces, the variance may be calculated for each element of the feature vectors. In this instance, obtaining the within-class variance (corresponding to the distribution of observational errors) from differences between the elements of feature vectors that can be assumed to belong to the same person exhibits higher performance than using the variance of the elements over the entire training set (corresponding to the distribution of patterns, i.e., the between-class distribution). Therefore, it is preferable to normalize the basis matrix U using the within-class variance.

The above-mentioned operation is performed for each local area s(x, y), so that a facial feature consisting of M (=42) vectors each having N (=5) elements is obtained. The face metadata generating unit 12 generates this facial feature as the face metadata for the inputted face image.
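Tying the sketches above together, the metadata for one face could be generated as follows; clip_local_areas, frequency_features, and train_projection_basis are the hypothetical helpers introduced earlier.

```python
import numpy as np

def generate_face_metadata(face, bases):
    """Generate face metadata: clip the M = 42 local areas of a
    normalized face, extract the 127-dimensional frequency features,
    and project each onto its own pre-trained, eigenvalue-normalized
    basis.  bases: 42 matrices of shape (5, 127)."""
    areas = clip_local_areas(face)                       # (42, 16, 16)
    return np.stack([basis @ frequency_features(area)
                     for area, basis in zip(areas, bases)])  # (42, 5)
```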

As mentioned above, the above-mentioned face metadata generation procedure can be executed by a computer according to a computer program.

(2) Calculation of Face Similarity

The operation of the face similarity calculating unit 14 will now be described.

The face similarity calculating unit 14 calculates a similarity d({y₁^(i)}, {y₂^(i)}) between two faces using the M N-dimensional feature vectors {y₁^(i)} and {y₂^(i)} (i = 1, 2, . . . , M) obtained from two sets of face metadata.

For example, the similarity is calculated by the following expression (squared distance):

$$d(\{y_1^i\}, \{y_2^i\}) = \sum_{i=1}^{M} w_i \left( \sum_{k=1}^{N} \left( y_{1,k}^i - y_{2,k}^i \right)^2 \right) \qquad (2)$$

In this case, the distance is the Mahalanobis distance as mentioned above, because the basis matrix is previously normalized by the eigenvalues. Alternatively, the similarity can be calculated as a linear combination of the cosines of the feature vectors to be compared. In this case, the similarity is expressed by the following expression:

$$d(\{y_1^i\}, \{y_2^i\}) = \sum_{i=1}^{M} w_i \frac{y_1^i \cdot y_2^i}{\left| y_1^i \right| \left| y_2^i \right|} \qquad (3)$$

Here, w_(i) denotes a weighting coefficient for each local area. For example, when μ_(i) denotes the average of the similarities (the Mahalanobis distances or the cosines of the vectors) between the feature vectors of local area i computed over face images of the same person in a prepared training set, the reciprocal 1/μ_(i) can be used as the weighting coefficient w_(i).
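Both measures, together with the per-area weights, reduce to a few lines. The following sketch assumes the (42, 5) metadata arrays from the earlier sketches; the derivation of the weights from a training set is indicated only as a comment with an assumed array name.

```python
import numpy as np

def face_distance(meta1, meta2, weights):
    """Expression (2): weighted sum over the M areas of the squared
    distances between projected vectors (the Mahalanobis distance,
    given the eigenvalue-normalized bases).  meta*: (42, 5) arrays."""
    return float((weights * ((meta1 - meta2) ** 2).sum(axis=1)).sum())

def face_cosine(meta1, meta2, weights):
    """Expression (3): weighted linear combination of the cosines of
    corresponding feature vectors (larger value = more similar)."""
    num = (meta1 * meta2).sum(axis=1)
    den = np.linalg.norm(meta1, axis=1) * np.linalg.norm(meta2, axis=1)
    return float((weights * num / den).sum())

# Per the text, w_i = 1 / mu_i, where mu_i is the mean same-person
# similarity of area i over a training set, e.g. (assumed array):
#   weights = 1.0 / same_person_dists.mean(axis=0)   # shape (42,)
```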

Weighting is performed for each area as mentioned above: a small weight w_(i) is given to each unstable local area (a local area in which the value of μ_(i) is large), while a more effective local area contributes its feature with a larger weight w_(i). Giving each local area a weight that reflects its reliability thus realizes identification with high accuracy.

When distances are used, a larger value means a lower similarity (a low similarity means that the faces do not look alike). When cosines are used, a larger value means a higher similarity (a high similarity means that the faces look alike).

The above description relates to the case where one face image is registered and retrieval is performed using this face image. When a plurality of images of the face of one person are registered and retrieval is performed using one face image, for example, a similarity can be calculated using the metadata of each of the registered face images.

Similarly, when a plurality of images of one face are registered and retrieval is performed using a plurality of images, the average of the similarities over all combinations, or their minimum value, is obtained to calculate a similarity with respect to the data of one face. This means that if moving pictures are regarded as a plurality of images, the matching system according to the present invention can also be applied to face recognition in moving pictures.

As mentioned above, according to the present invention, a face image is separated into a plurality of local areas, frequency features such as the Fourier frequency spectrum are extracted for the respective clipped areas, and the extracted frequency features are projected onto subspaces according to a method such as principal component analysis or independent component analysis to obtain feature vectors, which are generated as face metadata. The face metadata thus has a compact description length and characteristics that are stable against partial positional changes. Face recognition can therefore be achieved at high speed and with high accuracy by using such face metadata.

CLAIMS

1. A face metadata generating method (12) of generating metadata related to face information of an image, said face metadata generating method comprising: a step (121) of clipping a plurality of different local areas of said image; a step (122) of extracting frequency features for the respective local areas; and a step (123) of projecting feature vectors, which are vectors consisting of said frequency features, onto predefined subspaces; thereby extracting the projected feature vectors of the respective local areas so as to generate the projected feature vectors as face metadata.

2. The face metadata generating method according to claim 1, wherein power spectral intensities of Fourier frequencies obtained by discrete Fourier transform are extracted as said frequency features.

3. The face metadata generating method according to claim 1, wherein expansion coefficients obtained by discrete cosine transform are extracted as said frequency features.

4. The face metadata generating method according to claim 1, wherein expansion coefficients obtained by discrete sine transform are extracted as said frequency features.

5. The face metadata generating method according to claim 1, wherein said subspaces are specified by basis vectors previously obtained by principal component analysis of frequency features, and frequency feature vectors are projected onto the specified subspaces to calculate principal component vectors.

6. The face metadata generating method according to claim 1, wherein said subspaces are specified by basis vectors previously obtained by independent component analysis of frequency features, and frequency feature vectors are projected onto the specified subspaces to calculate feature vectors.

7. The face metadata generating method according to claim 1, wherein said subspaces are specified by basis vectors previously obtained by discriminant analysis of frequency features, and frequency feature vectors are projected onto the specified subspaces to calculate feature vectors.

8. The face metadata generating method according to claim 1, wherein area positions corresponding to the respective local areas are searched for in said image to obtain clipping positions, and after that the local areas are clipped.
9. A face metadata generating unit (12) of generating metadata related to face information of an image, said face metadata generating unit comprising at least: area clipping means (121) for clipping local areas of said image; frequency feature extracting means (122) for extracting frequency features for the areas clipped by said area clipping means; and vector projection means (123) for projecting feature vectors, which are vectors consisting of the frequency features extracted by said frequency feature extracting means, onto predefined subspaces; thereby extracting the projected feature vectors of a plurality of different local areas so as to generate the projected feature vectors as face metadata.

10. The face metadata generating unit according to claim 9, wherein said frequency feature extracting means (122) extracts power spectral intensities of Fourier frequencies, obtained by discrete Fourier transform, as frequency features.

11. The face metadata generating unit according to claim 9, wherein said frequency feature extracting means (122) extracts expansion coefficients, obtained by discrete cosine transform, as frequency features.

12. The face metadata generating unit according to claim 9, wherein said frequency feature extracting means (122) extracts expansion coefficients, obtained by discrete sine transform, as frequency features.

13. The face metadata generating unit according to claim 9, wherein said vector projection means (123) projects frequency feature vectors onto subspaces specified by basis vectors, which are previously obtained by principal component analysis of the frequency features, to calculate principal component vectors.

14. The face metadata generating unit according to claim 9, wherein said vector projection means (123) projects frequency feature vectors onto subspaces specified by basis vectors, which are previously obtained by independent component analysis of the frequency features, to calculate the feature vectors.

15. The face metadata generating unit according to claim 9, wherein said vector projection means (123) projects frequency feature vectors onto subspaces specified by basis vectors, which are previously obtained by discriminant analysis of the frequency features, to calculate the feature vectors.

16. The face metadata generating unit according to claim 9, wherein said area clipping means (121) searches for area positions corresponding to the respective local areas in said image to obtain clipping positions, and then clips the local areas.

17. A program making a computer generate metadata related to face information of an image, said program making said computer realize: a function (121) for clipping a plurality of different local areas of said image; a function (122) for extracting frequency features for the respective local areas; and a function (123) for projecting feature vectors, which are vectors consisting of said frequency features, onto predefined subspaces, thereby making said computer extract the projected feature vectors of the respective local areas and generate the projected feature vectors as face metadata.

18. A face image matching system comprising a face image input unit (11) for inputting a face image, a face metadata generating unit (12) for generating face metadata from an inputted face image, a face metadata storage unit (13) for storing generated face metadata therein, a face similarity calculating unit (14) for calculating a similarity of a face from said face metadata, a face image database (15) for storing said face images, a controller (16) for controlling, in response to a registration request and a retrieval request for the image, input of the image, generation of the metadata, storing of the metadata, and calculation of face similarity, and a display unit (17) for displaying the face image and other information, wherein said face metadata generating unit (12) comprises: area clipping means (121) for clipping local areas of said face image; frequency feature extracting means (122) for extracting frequency features for the areas clipped by said area clipping means; and vector projection means (123) for projecting feature vectors, which are vectors consisting of the frequency features extracted by said frequency feature extracting means, onto predefined subspaces.